Corpora of Speech and Text Data Now Available

The NC State University Libraries now provides access to several corpora of speech and text data for non-commercial use. These corpora may be especially useful to researchers in natural language processing and linguistics.

The Linguistic Data Consortium (LDC) is an open consortium of universities, libraries, corporations, and government research laboratories that creates and distributes a wide array of language resources. The Libraries held a Standard Membership for 2019 and, through that membership and individual purchases, acquired perpetual access to five corpora from LDC's catalog.

The five corpora are:

Questions about this data, as well as requests for access to other corpora released in 2019, can be sent to Emily Cox, Collections & Research Librarian for Humanities, Social Sciences, & Digital Media. Purchase suggestions can be submitted using our Suggest a Purchase form.